This version contains the source code. Please do not use the code without permission from the authors.
The objective of this study is to cluster heroes based on their game impact and, in the event that an opposing team disrupts a team's key strategy by banning their key hero, to identify the alternative heroes most similar to the banned one.
For more info about the DPC, you can visit https://www.dota2.com/procircuit.
*NOTE: `premium` matches are DPC games, which are, in essence, professional DotA 2 matches.
from IPython.display import HTML
HTML('''<script>
function code_toggle() {
if (code_shown){
$('div.input').hide('500');
$('#toggleButton').val('Show Code')
} else {
$('div.input').show('500');
$('#toggleButton').val('Hide Code')
}
code_shown = !code_shown
}
$( document ).ready(function(){
code_shown=false;
$('div.input').hide()
});
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>''')
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3
import requests
import json
from sklearn.metrics import calinski_harabaz_score, silhouette_score
from scipy.spatial.distance import euclidean, cityblock, cosine
from collections import Counter, OrderedDict
from wordcloud import WordCloud
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.cluster import KMeans
from math import pi
# Connecting to the database
conn = sqlite3.connect('dmwfinalproject.db')
c = conn.cursor()
# Select the dataframe from the database
df_ds = pd.read_sql('''SELECT * FROM dota2''', conn)
# Set hero_id as index for visualization
df_ds.set_index('hero_id').head()
print("The dataframe's shape is:", df_ds.shape)
| Features | Description |
|---|---|
| Categorical | |
| hero_id | The ID value of the hero played |
| match_id | The ID number of the match assigned by Valve |
| player_slot | Which slot the player is in. 0-127 are Radiant, 128-255 are Dire |
| Numerical | |
| ancient_kills | Total number of Ancient creeps killed by the player |
| assists | Number of assists the player had |
| camps_stacked | Number of camps stacked |
| courier_kills | Total number of courier kills the player had |
| creeps_stacked | Number of creeps stacked |
| deaths | Number of deaths |
| denies | Number of denies |
| gold_per_min | Gold Per Minute obtained by this player |
| gold_spent | How much gold the player spent |
| hero_damage | Hero Damage Dealt |
| hero_healing | Hero Healing Done |
| kda | Kill-Death-Assist ratio |
| kills | Number of kills |
| last_hits | Number of last hits |
| level | Level at the end of the game |
| neutral_kills | Total number of neutral creeps killed |
| obs_placed | Total number of observer wards placed |
| observer_kills | Total number of observer wards killed by the player |
| observer_uses | Number of observer wards used |
| roshan_kills | Total number of roshan kills (last hit on roshan) the player had |
| rune_pickups | Number of runes picked up |
| sen_placed | How many sentries were placed by the player |
| sentry_kills | Total number of sentry wards killed by the player |
| sentry_uses | Number of sentry wards used |
| stuns | Total stun duration of all stuns by the player |
| tower_damage | Total tower damage done by the player |
| tower_kills | Total number of tower kills the player had |
| xp_per_min | Experience Per Minute obtained by the player |
As mentioned in Section 3.2, the dataframe is composed of 15,260 rows and 31 columns, scraped using the OpenDota API and the requests library. The API response was then parsed using the json library.
The actual code for scraping is shown below.
# Importing premium match ids needed from the retrieved .csv file
df_prem_mids = pd.read_csv('premium match ids.csv')
prem_mids = df_prem_mids['match_id'].tolist()
# Preliminaries
# Create a list of stats or features to be scraped
stat_list = ['match_id', 'player_slot', 'assists', 'camps_stacked',
'creeps_stacked', 'deaths', 'denies', 'gold_per_min',
'gold_spent', 'hero_damage', 'hero_healing', 'hero_id', 'kills',
'last_hits', 'level', 'obs_placed', 'rune_pickups', 'sen_placed',
'stuns', 'tower_damage', 'xp_per_min', 'kda', 'neutral_kills',
'tower_kills', 'courier_kills', 'observer_kills', 'sentry_kills',
'roshan_kills', 'ancient_kills', 'observer_uses', 'sentry_uses']
# Since the OpenDota API returns hero_id as an integer, we create a hero id
# to hero name mapping
hero_id = {
1: "antimage",
2: "axe",
3: "bane",
4: "bloodseeker",
5: "crystal_maiden",
6: "drow_ranger",
7: "earthshaker",
8: "juggernaut",
9: "mirana",
10: "morphling",
11: "nevermore",
12: "phantom_lancer",
13: "puck",
14: "pudge",
15: "razor",
16: "sand_king",
17: "storm_spirit",
18: "sven",
19: "tiny",
20: "vengefulspirit",
21: "windrunner",
22: "zuus",
23: "kunkka",
25: "lina",
26: "lion",
27: "shadow_shaman",
28: "slardar",
29: "tidehunter",
30: "witch_doctor",
31: "lich",
32: "riki",
33: "enigma",
34: "tinker",
35: "sniper",
36: "necrolyte",
37: "warlock",
38: "beastmaster",
39: "queenofpain",
40: "venomancer",
41: "faceless_void",
42: "skeleton_king",
43: "death_prophet",
44: "phantom_assassin",
45: "pugna",
46: "templar_assassin",
47: "viper",
48: "luna",
49: "dragon_knight",
50: "dazzle",
51: "rattletrap",
52: "leshrac",
53: "furion",
54: "life_stealer",
55: "dark_seer",
56: "clinkz",
57: "omniknight",
58: "enchantress",
59: "huskar",
60: "night_stalker",
61: "broodmother",
62: "bounty_hunter",
63: "weaver",
64: "jakiro",
65: "batrider",
66: "chen",
67: "spectre",
68: "ancient_apparition",
69: "doom_bringer",
70: "ursa",
71: "spirit_breaker",
72: "gyrocopter",
73: "alchemist",
74: "invoker",
75: "silencer",
76: "obsidian_destroyer",
77: "lycan",
78: "brewmaster",
79: "shadow_demon",
80: "lone_druid",
81: "chaos_knight",
82: "meepo",
83: "treant",
84: "ogre_magi",
85: "undying",
86: "rubick",
87: "disruptor",
88: "nyx_assassin",
89: "naga_siren",
90: "keeper_of_the_light",
91: "wisp",
92: "visage",
93: "slark",
94: "medusa",
95: "troll_warlord",
96: "centaur",
97: "magnataur",
98: "shredder",
99: "bristleback",
100: "tusk",
101: "skywrath_mage",
102: "abaddon",
103: "elder_titan",
104: "legion_commander",
105: "techies",
106: "ember_spirit",
107: "earth_spirit",
108: "abyssal_underlord",
109: "terrorblade",
110: "phoenix",
111: "oracle",
112: "winter_wyvern",
113: "arc_warden",
114: "monkey_king",
119: "dark_willow",
120: "pangolier",
121: "grimstroke",
129: "mars"
}
# Scraping identified stats or features per match id and saving to a dataframe
# to_df = []
# for mid in prem_mids:
# url = 'https://api.opendota.com/api/matches/' + str(mid) + '?api_key'
# resp = requests.get(url)
# # Converting resp.text to json_file
# json_file = json.loads(resp.text)
# try:
# for i in range(len(json_file['players'])):
# temp_dict = {}
# for key in stat_list:
# try:
# if key == 'hero_id': # Convert hero_id to hero name
# temp_dict[key] = hero_id[json_file['players']
# [i]['hero_id']]
# else:
# temp_dict[key] = json_file['players'][i][key]
# except:
# temp_dict[key] = np.nan
# to_df.append(temp_dict)
# except:
# continue
For replication of the study and easier reanalysis, the data was stored in a database named dmwfinalproject.db.
# Storing dataframe to the database
# df = pd.DataFrame(to_df)
# df.to_sql('dota2', conn, if_exists='replace', index=False)
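Since the scraping and storing cells are commented out above, a minimal runnable sketch of the same to_sql / read_sql round trip (using an in-memory SQLite database and made-up rows for illustration) is:

```python
import sqlite3
import pandas as pd

# In-memory database stands in for dmwfinalproject.db (illustration only)
conn = sqlite3.connect(':memory:')
df = pd.DataFrame({'hero_id': ['axe', 'bane'], 'kills': [5, 1]})
df.to_sql('dota2', conn, if_exists='replace', index=False)
df_back = pd.read_sql('SELECT * FROM dota2', conn)
print(df_back.shape)  # → (2, 2)
```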
The features selected are the following:
# Connecting to the database
conn = sqlite3.connect('dmwfinalproject.db')
c = conn.cursor()
# Importing the data from the table named dota2
df_from_sql = pd.read_sql('''SELECT * FROM dota2''', conn)
# Creating a hero name mapping, to be used later after clustering
df_target = df_from_sql['hero_id']
# Identifying features to drop
to_drop = ['courier_kills', 'creeps_stacked', 'assists', 'deaths',
'gold_spent', 'hero_id', 'kills', 'last_hits', 'level', 'match_id',
'observer_kills', 'observer_uses', 'player_slot', 'roshan_kills',
'sentry_kills', 'sentry_uses', 'tower_kills']
# Dropping unnecessary features
df_dropped = df_from_sql.drop(columns=to_drop)
df_dropped = df_dropped.replace(np.nan, 0.0)
df_dropped['hero_id'] = df_target
After feature selection and data cleaning, here is what the working data looks like.
df_dropped.head()
Shown below are the descriptive statistics of each feature.
descr_stat = []
for i in df_dropped.columns[:-1]:
descr_stat.append(df_dropped.loc[:, i].describe())
df_descr_stat = pd.DataFrame(descr_stat)
df_descr_stat
Before proceeding with the clustering proper, it is worth taking a first look at the data to set expectations for the clustering. Below is a horizontal bar plot of the 10 most frequently picked heroes in the recent DPC.
hero_count = Counter(df_dropped['hero_id']).most_common()
heroes = [i[0] for i in hero_count][:10][::-1]
count = [i[1] for i in hero_count][:10][::-1]
fig, ax = plt.subplots(figsize=(10,7))
ax.barh(heroes, count);
ax.set_title('Top 10 Picked Heroes for the recent DPC')
ax.set_xlabel('Count')
ax.set_ylabel('Hero')
print(hero_count[:10])
# Plotting distributions of each feature
fig, ax = plt.subplots(4, 4, figsize=(20,15))
col = 0
for i in range(4):
for k in range(4):
try:
sns.distplot(df_dropped.iloc[:, col], ax=ax[i,k])
col += 1
except:
pass
fig.suptitle('Distribution Plot of the Features Selected', fontsize=15)
plt.savefig('distribution.png', transparent=True)
# Pairplot of each feature
g = sns.pairplot(df_dropped)
g.fig.suptitle("Pairplot of Features", fontsize=15, y=1.08);
# Normalizing df_dropped
standard_scaler = StandardScaler()
df_dropped_normed = standard_scaler.fit_transform(df_dropped.iloc[:,:-1])
# Creating a mapping of feature names per column since normalizing removes the
# column names
feat_map = {}
for k, v in enumerate(df_dropped.iloc[:,:-1].columns.to_list()):
feat_map[k] = v
# Reassigning column names to df_dropped_normed
df_dropped_normed = pd.DataFrame(df_dropped_normed).rename(columns=feat_map)
df_dropped_normed.head()
# Plotting normalized distributions of each feature
fig, ax = plt.subplots(4, 4, figsize=(20,15))
col = 0
for i in range(4):
for k in range(4):
try:
sns.distplot(df_dropped_normed.iloc[:, col], ax=ax[i,k])
col += 1
except:
pass
fig.suptitle('Normalized Distribution Plot of the Features Selected', fontsize=15);
As seen from the two distribution plots above, the shape of each distribution is preserved, but the means and standard deviations are rescaled to zero (0) and one (1), respectively.
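This behavior of StandardScaler can be sanity-checked on a toy matrix (values made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled = StandardScaler().fit_transform(X)
# Each column now has mean ~0 and standard deviation ~1
print(X_scaled.mean(axis=0))  # → ≈ [0. 0.]
print(X_scaled.std(axis=0))   # → ≈ [1. 1.]
```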
# Creating a new copy of df_dropped with no normalization to append cluster
# labels later
df_dropped_clustered = df_dropped.copy()
To determine the optimal number of clusters that best fits the data, internal validation criteria were used. Below are the internal validations used to evaluate the clusters formed using K-Means.
Below are the functions that we used in performing the internal validation criteria.
# Defining Intra to Inter cluster ratio function
def intra_to_inter(X, y, dist, r):
"""Compute intracluster to intercluster distance ratio
Parameters
----------
X : array
Design matrix with each row corresponding to a point
y : array
Class label of each point
dist : callable
Distance between two points. It should accept two arrays, each
corresponding to the coordinates of each point
r : integer
Number of pairs to sample
Returns
-------
ratio : float
Intracluster to intercluster distance ratio
"""
P = []
Q = []
for i in range(r):
z = np.random.randint(0, len(X), 2)
if z[0] == z[1]:
continue
elif y[z[0]] == y[z[1]]:
P.append(dist(X[z[0]], X[z[1]]))
else:
Q.append(dist(X[z[0]], X[z[1]]))
return (np.average(P) / np.average(Q))
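As a quick illustration of what this ratio measures, two tight, well-separated blobs (synthetic data, made up for this sketch) should yield a ratio well below 1, since intracluster distances are much smaller than intercluster distances:

```python
import numpy as np
from scipy.spatial.distance import euclidean

rng = np.random.RandomState(0)
# Two tight blobs centered at (0, 0) and (10, 10)
X = np.vstack([rng.randn(50, 2) * 0.1, rng.randn(50, 2) * 0.1 + 10])
y = np.array([0] * 50 + [1] * 50)

P, Q = [], []  # sampled intra- and inter-cluster distances
for _ in range(500):
    a, b = rng.randint(0, len(X), 2)
    if a == b:
        continue
    (P if y[a] == y[b] else Q).append(euclidean(X[a], X[b]))
ratio = np.average(P) / np.average(Q)
print(ratio < 1)  # → True
```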
# Defining a function to plot each internal validation criteria
def plot_internal(inertias, chs, scs):
"""Plot internal validation values"""
fig, ax = plt.subplots(2,2, figsize=(15,10))
ks = np.arange(2, len(inertias)+2)
fig.suptitle('Internal Validation Criteria Values', fontsize=15)
ax[0,0].plot(ks, inertias, '-o', label='SSE (Lower is better)')
ax[0,0].axvline(3, ls='--', c='k')
ax[0,1].plot(ks, chs, '-ro', label='CH (Higher is better)')
# ax[0,1].axvline(3, ls='--', c='k')
ax[0,0].set_xlabel('$k$')
ax[0,0].set_title('SSE (Lower is better)')
ax[0,1].set_xlabel('$k$')
ax[0,1].set_title('CH (Higher is better)')
# ax[1,0].plot(ks, iidrs, '-go', label='Inter-intra (Lower is better)')
# ax[1,0].axvline(5, ls='--', c='k')
ax[1,1].plot(ks, scs, '-ko', label='Silhouette coefficient (Higher is better)')
# ax[1,0].set_title('Inter-Intra (Lower is better)')
# ax[1,0].set_xlabel('$k$')
ax[1,1].set_title('Silhouette (Higher is better)')
ax[1,1].set_xlabel('$k$')
return fig
# Creating a cluster range for various k
def cluster_range(X, clusterer, k_start, k_stop):
"""
Returns a dictionary of the cluster labels and internal validation values
Parameters
----------
X : design matrix
clusterer : clustering object
k_start : int
Initial k
k_stop : int
Final k
Returns
-------
validation : dict
Contains the cluster labels and internal validation values
"""
validation = {'ys': [],
'inertias': [],
'chs': [],
'iidrs': [],
'scs': []}
for k in range(k_start, k_stop+1):
np.random.seed(11)
kmeans_X = KMeans(
n_clusters=k, random_state=clusterer.random_state)
y_predict_X = kmeans_X.fit_predict(X)
validation['ys'] += [y_predict_X]
validation['inertias'] += [kmeans_X.inertia_]
validation['chs'] += [calinski_harabaz_score(X, y_predict_X)]
# validation['iidrs'] += [intra_to_inter(X, y_predict_X,
# euclidean, 50)]
validation['scs'] += [silhouette_score(X, y_predict_X)]
return validation
# Calculating internal validation criteria for various k
res_X_normed = cluster_range(df_dropped_normed, KMeans(random_state=1337), 2, 11)
# Plotting internal validation criteria for various k
plot_internal(res_X_normed['inertias'], res_X_normed['chs'],
res_X_normed['scs']);
# Running k-means
k = 3
k_means = KMeans(n_clusters=k, random_state = 1337)\
.fit(df_dropped_normed)
df_dropped_clustered['Cluster'] = k_means.labels_
cluster_list = {}
for i in range(k):
cluster_list[i] = df_dropped_clustered[df_dropped_clustered['Cluster']==i]
# Assign each cluster to a dataframe variable
cluster_0 = cluster_list[0]
cluster_1 = cluster_list[1]
cluster_2 = cluster_list[2]
# Creating most important features mapper per cluster
terms = df_dropped_clustered.columns
feat_list = []
for i in range(k):
feat_list_per_cluster = []
for ind in k_means.cluster_centers_.argsort()[:, ::-1][i, :]:
feat_list_per_cluster.append(terms[ind])
feat_list.append(feat_list_per_cluster)
weight_list = np.sort(k_means.cluster_centers_)[:,::-1].tolist()
fw_fin = []
for i in range(len(feat_list)):
fw = []
for feat, weight in zip(feat_list[i], weight_list[i]):
fw.append((feat, weight))
fw_fin.append(fw)
After clustering each hero observation based on its game impact, the researchers aggregated each hero observation per cluster by calculating the mean values of the features for each unique hero. Additionally, the most important features for each cluster were identified using the k_means.cluster_centers_ attribute.
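The ranking relies on argsort over the cluster centers; on a made-up one-cluster example (hypothetical feature names and weights, not the fitted model's), the idea looks like this:

```python
import numpy as np

# Hypothetical feature names and a single cluster center (values made up)
terms = np.array(['stuns', 'hero_healing', 'gold_per_min'])
centers = np.array([[0.2, 1.5, -0.3]])
order = centers.argsort()[:, ::-1]  # feature indices, descending weight
print(terms[order[0]].tolist())  # → ['hero_healing', 'stuns', 'gold_per_min']
```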
# Aggregating each cluster per hero
cluster0_agg = cluster_0.groupby('hero_id').mean() # Offlane - Disabler / Healer - Position 3/4 - Utility
cluster1_agg = cluster_1.groupby('hero_id').mean() # Support - Position 5
cluster2_agg = cluster_2.groupby('hero_id').mean() # Carry/Mid - Position 1/2
# Reindexed columns according to feature importance
cluster0_agg_re = cluster0_agg.reindex(columns=feat_list[0])
cluster1_agg_re = cluster1_agg.reindex(columns=feat_list[1])
cluster2_agg_re = cluster2_agg.reindex(columns=feat_list[2])
The most important features for each cluster are the following. For Cluster 0 -- stuns, hero_healing, camps_stacked; For Cluster 1 -- obs_placed, sen_placed, hero_healing; For Cluster 2 -- gold_per_min, neutral_kills, xp_per_min.
Below is the complete list of features per cluster ordered by decreasing importance.
print('Cluster 0 most important features: \n', cluster0_agg_re.columns.tolist())
print('\nCluster 1 most important features: \n', cluster1_agg_re.columns.tolist())
print('\nCluster 2 most important features: \n', cluster2_agg_re.columns.tolist())
Looking at the list of features above and the distributions of each cluster plotted per feature below, the researchers can then define the themes of each cluster formed. The three (3) clusters were named as Utility (Cluster 0), Support (Cluster 1) and Core (Cluster 2).
Based on the plots below, the majority of the distributions are quite telling of each cluster's characteristics:
# Plotting distributions of features for each cluster
clusters = [(cluster_0, 'Utility'), (cluster_1, 'Support'), (cluster_2, 'Core')]
feat_positions = [('stuns', 0, 0), ('hero_healing', 0, 1),
                  ('camps_stacked', 0, 2), ('rune_pickups', 0, 3),
                  ('denies', 1, 0), ('kda', 1, 1), ('xp_per_min', 1, 2),
                  ('hero_damage', 1, 3), ('gold_per_min', 2, 0),
                  ('neutral_kills', 2, 1), ('ancient_kills', 2, 2),
                  ('tower_damage', 2, 3), ('sen_placed', 3, 1),
                  ('obs_placed', 3, 2)]
fig, ax = plt.subplots(4, 4, figsize=(20, 15))
for feat, row, col in feat_positions:
    for cluster, label in clusters:
        sns.distplot(cluster.loc[:, feat], ax=ax[row, col], label=label)
    ax[row, col].legend()
fig.suptitle('Distributions of Each Feature per Cluster', fontsize=15);
To further visualize the most important features of each cluster, the researchers plotted a radar plot with feature weights as values seen below. The radar plot is consistent with the list of features and distribution plots discussed above.
# Creating a radar dataframe
radar = {'cluster' : ['0 (Utility)', '1 (Support)', '2 (Core)']}
for i in range(len(fw_fin)):
for x in range(len(fw_fin[i])):
if fw_fin[i][x][0] not in radar.keys():
radar[fw_fin[i][x][0]] = [fw_fin[i][x][1]]
else:
radar[fw_fin[i][x][0]].append(fw_fin[i][x][1])
df_radar = pd.DataFrame(radar)
def make_spider( row, title, color):
# number of variable
categories=list(df_radar)[1:]
N = len(categories)
# What will be the angle of each axis in the plot? (we divide the plot / number of variable)
angles = [n / float(N) * 2 * pi for n in range(N)]
angles += angles[:1]
# Initialise the spider plot
ax = plt.subplot(1,3,row+1, polar=True, )
# If you want the first axis to be on top:
ax.set_theta_offset(pi / 2)
ax.set_theta_direction(-1)
# Draw one axis per variable and add labels
plt.xticks(angles[:-1], categories, color='grey', size=8)
# Draw ylabels
ax.set_rlabel_position(0)
plt.yticks([-2, -1, 0, 1, 2], ["-2", "-1", "0", "1", "2"], color="grey", size=7)
plt.ylim(-2,2)
# Ind1
values=df_radar.loc[row].drop('cluster').values.flatten().tolist()
values += values[:1]
ax.plot(angles, values, color=color, linewidth=2, linestyle='solid')
ax.fill(angles, values, color=color, alpha=0.4)
# Add a title
plt.title(title, size=11, color=color, y=1.1)
# ------- PART 2: Apply to all individuals
# initialize the figure
my_dpi=150
plt.figure(figsize=(25,15), dpi=my_dpi)
# Create a color palette:
my_palette = plt.cm.get_cmap("Set2", len(df_radar.index))
# Loop to plot
for row in range(0, len(df_radar.index)):
make_spider( row=row, title='Cluster '+df_radar['cluster'][row], color=my_palette(row))
The Utility cluster contains a total of 117 heroes, the Support cluster 60 heroes, and the Core cluster 91 heroes. Note that a hero can appear in more than one cluster, since clustering is done per player-match observation.
A sample of the actual heroes included in each cluster is given below (limited to five (5) heroes only).
print('Utility:')
print(np.random.choice(list(set(cluster_0['hero_id'])), 5))
print('Support:')
print(np.random.choice(list(set(cluster_1['hero_id'])), 5))
print('Core:')
print(np.random.choice(list(set(cluster_2['hero_id'])), 5))
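The per-cluster hero counts quoted above come down to counting unique hero_id values in each cluster dataframe; on a toy stand-in (made-up rows), the pattern is:

```python
import pandas as pd

# Toy stand-in for one of the cluster dataframes (rows made up)
cluster_toy = pd.DataFrame({'hero_id': ['axe', 'bane', 'axe', 'lich']})
print(cluster_toy['hero_id'].nunique())  # → 3
```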
Now that we have clustered the heroes, we can proceed to the second part of the analysis: recommending alternative picks for a banned hero. By calculating the Euclidean distance between heroes in the same cluster's feature space, we can select the k heroes most similar to the banned hero.
# Object retrieval function
def nearest_k(df, query, objects, k, dist):
"""Return the indices to objects most similar to query
Parameters
----------
df : dataframe
Dataframe to retrieve target names from
query : ndarray
query object represented in the same form vector representation as the
objects
objects : ndarray
vector-represented objects in the database; rows correspond to
objects, columns correspond to features
k : int
number of most similar objects to return
dist : function
accepts two ndarrays as parameters then returns their distance
Returns
-------
most_similar : ndarray
Indices to the most similar objects in the database
"""
indices = np.argsort([dist(i, query) for i in objects])[1:k+1]
return df.iloc[indices, :].index.tolist()
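As a sanity check of the retrieval logic, here is a toy run (the function is reproduced so the snippet is self-contained; the data is made up). Note that the [1:k+1] slice skips index 0, which is the query itself when the query is in the database:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import euclidean

def nearest_k(df, query, objects, k, dist):
    # Sort all objects by distance to the query, skipping the query itself
    indices = np.argsort([dist(i, query) for i in objects])[1:k+1]
    return df.iloc[indices, :].index.tolist()

# Toy feature space: 'b' and 'd' are close to 'a', 'c' is far away
toy = pd.DataFrame([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.2, 0.1]],
                   index=['a', 'b', 'c', 'd'])
print(nearest_k(toy, toy.loc['a'].values, toy.values, 2, euclidean))
# → ['b', 'd']
```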
# Creating a query function
def findsimilarhero(query, cluster_number, k=3):
'''Returns a list of the k heroes most similar to the query, using
Euclidean distance
Parameters
----------
query : str
hero name to query
cluster_number : str
cluster to query in
k : int
number of similar heroes to return
Returns
-------
similar : list
List of the k heroes most similar to the query
'''
try:
if cluster_number == 'utility':
cluster_number = cluster0_agg_re
elif cluster_number == 'support':
cluster_number = cluster1_agg_re
elif cluster_number == 'core':
cluster_number = cluster2_agg_re
else:
return print('No cluster', cluster_number, 'found!')
return nearest_k(cluster_number, cluster_number.loc[query, :].values, cluster_number.values, k, euclidean)
except:
return print('This hero is not played for this role!')
For example, suppose a team wanted to pick a support Crystal Maiden, but the opposing team banned her. We can then determine the most similar heroes with the sample query shown below.
# Sample query
#(Input: 'Banned Hero Name', 'Supposed Role', 'No. of similar heroes')
print('Querying for an alternative to a support Crystal Maiden:')
findsimilarhero('crystal_maiden', 'support', k=4)
As another example, suppose a team wanted to play Abaddon as a support --
# Sample query
#(Input: 'Banned Hero Name', 'Supposed Role', 'No. of similar heroes')
print('Querying for an alternative to a support Abaddon:')
findsimilarhero('abaddon', 'support', k=4)
To show the purpose of clustering, suppose instead that the team wanted to play Abaddon as a core --
# Sample query
#(Input: 'Banned Hero Name', 'Supposed Role', 'No. of similar heroes')
print('Querying for an alternative to a core Abaddon:')
findsimilarhero('abaddon', 'core', k=4)
The resulting similar heroes are fair alternatives for the queried heroes during actual DotA 2 games.
The researchers were able to identify three (3) clusters based on a hero's game impact: Utility, Support, and Core.
Additionally, the researchers were also successful in providing a system that can determine an alternative to a banned hero. Based on the researchers' domain knowledge, the resulting similar heroes are fair alternatives to a queried hero during actual DotA 2 games.
For DotA 2 players, the Core cluster can be defined as Position 1 and 2; Utility as Position 3 and 4; and the Supports as Position 5.
For professional teams, apart from using this system to provide quick alternatives to a banned hero, it can also be used for theorycrafting alternative heroes to widen the pocket strategies of the team.
The researchers recommend the following to further optimize the system:
To complete the study, the researchers used the following resources as reference:
[1] Game modes. (2019). Dota 2 Wiki. Retrieved 3 June 2019, from https://dota2.gamepedia.com/Game_modes#Captains_Mode
[2] Dota 2 Statistics. (n.d.). Retrieved 3 June 2019, from https://www.opendota.com/explorer
[3] DotA 2 Guide. (n.d.). Retrieved 3 June 2019, from https://purgegamers.true.io/g/dota-2-guide/
[4] Use Faceting for Radar Chart. (n.d.). Retrieved 23 July 2019, from https://python-graph-gallery.com/392-use-faceting-for-radar-chart/
In addition to the references used in the study, the researchers would like to acknowledge Prof. Christian Alis, PhD, Prof. Erika Legara, PhD, and Prof. Eduardo David, Jr. for mentoring us throughout the course and imparting their knowledge in our journey to become Data Scientists.